Recently, I submitted an article introducing Smaji CJKV. Several reviewers suggested adding citations and background information to it. These suggestions are reasonable: most disciplines, engineering included, progress continuously, and new work is usually built on the foundation laid by its predecessors.
At first, however, I found it difficult to add such information. Over the past two decades, techniques for recording, encoding, and designing fonts for variant and rarely used characters have all been researched and developed, yet none has had much influence. Most of the resulting systems are independent and proprietary, and they cannot be integrated into general-purpose systems. Some are relatively open, but only at the user-interface level. Others are relatively open and standardized, but their infrastructure is weak and inadequate as a basis for further development. None of them seemed worth citing as a reference or as background knowledge.
In 1999, Unicode 3.0 introduced Unicode's own Ideographic Description Characters (IDCs). A sequence of these characters combined with component characters is called an Ideographic Description Sequence (IDS). Because IDS is built on Unicode, it integrates naturally into everyday Unicode-based systems, has a huge user base, and is easy to use. For example, the two characters of the word "时间" can be described as "⿰日寸" and "⿵门日" respectively. Even a character as complex as "𰻝", as in "𰻝𰻝面", can be described as "⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心". At first glance, the system seems fully capable.
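IDS describes the structure of a character but not its geometry. A minimal Python sketch of a recursive-descent IDS parser makes this concrete. It assumes only the twelve IDCs introduced in Unicode 3.0 (U+2FF0 to U+2FFB): ⿲ and ⿳ take three operands, and the rest take two. `parse_ids` and `leaves` are illustrative names, not part of Smaji CJKV.

```python
# Arity of the twelve Ideographic Description Characters U+2FF0..U+2FFB
# (later Unicode versions added more IDCs; they are not handled here).
IDC_ARITY = {
    "⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3, "⿴": 2, "⿵": 2,
    "⿶": 2, "⿷": 2, "⿸": 2, "⿹": 2, "⿺": 2, "⿻": 2,
}


def parse_ids(ids):
    """Parse an IDS into nested tuples (idc, operand, ...); leaves are characters."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("IDS ends before all operands are given")
        ch = ids[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch  # an ordinary component character
        return (ch,) + tuple(node() for _ in range(arity))

    tree = node()
    if pos != len(ids):
        raise ValueError(f"unexpected trailing text: {ids[pos:]!r}")
    return tree


def leaves(tree):
    """Component characters of a parsed IDS, in reading order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


biang = parse_ids("⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心")
print(leaves(biang))
# ['辶', '穴', '月', '幺', '言', '幺', '长', '马', '长', '刂', '心']
```

The resulting tree records which descriptor joins which components. It holds no sizes or positions at all.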
The problem is that IDS cannot decompose "丝", because Unicode does not encode the component that looks like "幺" without its final dot. Another example is "乔". It has "夭" on top and, below, a component (shown here as "?") that is not encoded either. The same difficulty arises in decomposing characters such as "与", "乌", "亇", "争", "亥", "以"…
Because too many "components" or "roots" actually do not have characters corresponded with, and this system requires that their definition domains and value domains are all Unicode collected characters. Therefore, this system design was incomplete from the beginning: common or even daily-used characters may be out of the scope of describable.
Other private systems, aware of this problem, relaxed the restriction on the domain and introduced private components. However, Chinese characters and their components are composed in diverse ways, and IDS and similar systems can describe only the idealized compositions. Anything less ideal is ambiguous. "⿻", for instance, states only that two components overlap. How they overlap, in which direction, and to what degree are not described at all. The glyph therefore cannot be reconstructed from the IDS, and the result is yet another broken and incomplete system.
Still, the review comments prompted me to reconsider. Are the efforts and legacy of the past still valuable, and could they become useful again once transformed and refined?
The flaws of past efforts can be summarized as follows:
- The domain of composite components is limited
- IDS lacks accuracy
- Not universal, or narrow in application scenarios
The solution is designed accordingly:
- The domain of composite components is limited
The first step is to lift this restriction without creating new problems. The following conditions must therefore be met:
- The domain is not limited to characters encoded in Unicode, because that set is incomplete.
- The defined base components must be able to compose any character; otherwise the result is another incomplete system.
- The basic components must not be added, deleted, or modified arbitrarily, so that the composition method does not fail or become unstable.
Basic strokes are the natural choice that satisfies all three requirements. What we need, however, is not the rough set of five so-called basic stroke types. We need to enumerate at least 63 basic strokes, together with mirroring (left-right, up-down) and rotation operators, because Chinese characters include mirrored and inverted forms.
- IDS lacks accuracy
The structures described by IDS follow a few patterns. The components are split top to bottom along the center (⿱, ⿳), split left to right along the center (⿰, ⿲), fully enclosed (⿴), surrounded on three sides (⿵, ⿶, ⿷), or surrounded on two adjacent sides (⿸, ⿹, ⿺). In each case the descriptor combines its components into a new shape. For components separated along the center, we only need to take the total height or width and divide it evenly. Each component then adjusts its aspect ratio to its share to obtain its new shape. For surrounding structures, the inner component is best-fitted to the outer component and scaled down slightly.
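The arithmetic for the separating and enclosing descriptors can be sketched in a few lines of Python. This is a simplification under stated assumptions:

- The frame is split into equal shares, which is the plainest reading of "take the average".
- Coordinates grow rightwards and downwards, as they appear to in the GOD examples later.
- The fixed `margin` used for ⿴ is a hypothetical stand-in for a real best-fit computation.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def split_left_right(frame: Box, n: int) -> List[Box]:
    """⿰ (n=2) or ⿲ (n=3): equal shares of the width, full height each."""
    share = frame.width / n
    return [Box(frame.x + i * share, frame.y, share, frame.height) for i in range(n)]


def split_top_bottom(frame: Box, n: int) -> List[Box]:
    """⿱ (n=2) or ⿳ (n=3): equal shares of the height, full width each."""
    share = frame.height / n
    return [Box(frame.x, frame.y + i * share, frame.width, share) for i in range(n)]


def enclose(frame: Box, margin: float) -> Box:
    """⿴: shrink the outer frame by `margin` on every side for the inner part."""
    return Box(frame.x + margin, frame.y + margin,
               frame.width - 2 * margin, frame.height - 2 * margin)


em = Box(0, 0, 128, 128)
left, right = split_left_right(em, 2)  # e.g. 日 and 寸 in ⿰日寸
print(left, right)
```

Each component would then be scaled into its box, which is what changes its aspect ratio. ⿻ has no counterpart here: no rule turns an overlapping pair into two boxes.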
The descriptor ⿻ states that its two operands overlap, which breaks the frame: the shapes of the components can no longer serve as the basis for arranging them. Moreover, neither the descriptors (IDCs) nor their operands (strokes or roots) carry any other intrinsic information from which positions could be computed, so the description system fails.
We therefore have to introduce additional information to fill the gap. With separating or enclosing descriptors, the shapes of the components are preserved, and so is the shape of their combination, whose bounding box is the outer frame. This yields two kinds of data: the size and position of the outer frame, and the size and position of each component once it is embedded in the frame. From these we obtain each component's position and size in a coordinate system whose origin is the outer frame.
Because ⿻ invalidates the component shapes, neither the outer frame nor the components' positions and sizes can be computed. What must be supplied, then, is the position and size of each component. From these, the outer frame can in turn be derived as their best-fit bounding box.
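The two kinds of data described above can be related in a short Python sketch. Given explicit component boxes, as ⿻ requires, the outer frame is their smallest enclosing box. Each component can then be re-expressed with the frame's corner as the origin. The box values below are made up for illustration, and `outer_frame` and `relative_to` are hypothetical helper names.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def outer_frame(boxes: List[Box]) -> Box:
    """Best-fit frame: the smallest box that contains every component box."""
    left = min(b.x for b in boxes)
    top = min(b.y for b in boxes)
    right = max(b.x + b.width for b in boxes)
    bottom = max(b.y + b.height for b in boxes)
    return Box(left, top, right - left, bottom - top)


def relative_to(frame: Box, box: Box) -> Box:
    """Express a component box in coordinates whose origin is the frame corner."""
    return Box(box.x - frame.x, box.y - frame.y, box.width, box.height)


# Two overlapping components, positioned explicitly (illustrative values).
horizontal_part = Box(10, 20, 100, 60)
vertical_part = Box(50, 0, 20, 108)
frame = outer_frame([horizontal_part, vertical_part])
print(frame)                              # Box(x=10, y=0, width=100, height=108)
print(relative_to(frame, vertical_part))  # Box(x=40, y=0, width=20, height=108)
```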
To describe plane position and size information, we need to introduce a plane coordinate system.
The description of plane coordinates is a topic worth expanding on, and we will discuss it later. Now, let’s take a look at defect 3.
- Not universal, or narrow in application scenarios
The Unicode character set is meant to be a standard for information interchange, so components or roots must be selected from its own repertoire. That repertoire does not cover all the essential components. In addition, the descriptive power of Unicode's own Ideographic Description Characters (IDCs) is incomplete. Together, these gave rise to defects 1 and 2.
Universities, technical groups, and commercial organizations outside the Unicode Consortium have also tried to design or implement systems that are both Unicode-compatible and complete in descriptive power. Most come close to completeness, some are incompatible with Unicode, and few are flawless, which limits their application scenarios.
Another important reason is that the requirements for flexibility and timeliness are hard to meet. For example, a scholar may need to quote excerpts from an ancient book in which several characters appear in variant forms not included in the standard. Or a newly unearthed ancient text may contain characters never seen before. Such characters must be introduced into the standard, and our computer systems must then be updated before they can be encoded and displayed properly.
Meeting these needs requires going through the lengthy Unicode process, which may fail and will certainly hold up the writing of an article.
Smaji CJKV already provides a solution to this defect, so I will not go into details here.
In fact, Smaji CJKV originally had no plan for a glyph description system; only bitmap or vector images could be submitted. Designing a description system became possible once the core system was in place and kept compatible with Unicode. The reviewers' request for supplementary information, mentioned earlier, made me rethink my past experience, and that is when the design of the glyph description language began.
Now let's return to the problem skipped earlier:
- IDS lacks accuracy
The approach to this problem takes more space to describe, so it is covered in the following subsection.
Glyph Outline Description Language (GOD)
Because the standard form of this language is an XML document, an XML Schema Definition (XSD) is the most suitable way to describe it. The syntax of the language is defined in its schema document, god.xsd.
Creating an XML document
An XML document consists of an optional XML declaration, an optional document type declaration, and a document (root) element.
The version declaration ensures that future changes to XML will not affect the syntax and semantics of the document. The encoding declaration tells the XML processor which encoding the document uses. GOD 1.0 documents use XML version 1.0 and the UTF-8 encoding, so the XML declaration is fixed:
<?xml version="1.0" encoding="UTF-8"?>
Because the XML version defaults to 1.0, and a document without an encoding declaration must be encoded in UTF-8 or UTF-16, this declaration is not strictly necessary.
```xml
<?xml version="1.0"?>
<god version="1.0"
     xmlns="http://cjkv.smaji.org/ns/god"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://cjkv.smaji.org/ns/god http://cjkv.smaji.org/xml/1.0/xsd/god.xsd">
  <glyph unicode="516b,0">
    <stroke type="t" x="0" y="0" width="56" height="112"/>
    <stroke type="p" x="76" y="0" width="56" height="112"/>
  </glyph>
</god>
```
The first line is the optional XML declaration.
Lines 2 and 10 open and close the `god` root element, which mainly serves to indicate the version of this GOD document. The `version` attribute on line 2 states that the document follows the syntax and semantics of version 1.0.
Lines 4 and 5 are optional. They reference the XSD of this GOD document, so that capable text editors can use it to validate the document being edited and to offer suggestions such as auto-completion.
The next child element is `glyph`. Its required attribute `unicode` gives the Unicode scalar value of the glyph described in this GOD document. The value is a hexadecimal number representing the scalar value, optionally followed by a comma and a value called the variation selector. In the example, the value of the `unicode` attribute is `516b,0`, and 516b is the scalar value of the Chinese character 「八」.
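A small Python sketch of reading the `unicode` attribute follows. It assumes two things:

- The part after the comma is a decimal integer.
- That part may be omitted, as in the `ref` element of a later example.

`parse_unicode_attr` is a hypothetical helper, not part of the Smaji libraries.

```python
from typing import Optional, Tuple


def parse_unicode_attr(value: str) -> Tuple[int, Optional[int]]:
    """Split a value such as "516b,0" into (scalar value, variation selector)."""
    scalar_text, comma, selector_text = value.partition(",")
    scalar = int(scalar_text.strip(), 16)
    # A Unicode scalar value is any code point except the surrogates.
    if not (0 <= scalar <= 0x10FFFF) or (0xD800 <= scalar <= 0xDFFF):
        raise ValueError(f"not a Unicode scalar value: {value!r}")
    selector = int(selector_text.strip()) if comma else None
    return scalar, selector


scalar, selector = parse_unicode_attr("516b,0")
print(hex(scalar), selector, chr(scalar))  # 0x516b 0 八
```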
「八」 consists of two strokes: the first is a throw (撇) and the second a press (捺). The `glyph` element therefore gets two child elements, a stroke of type `t` (撇) and a stroke of type `p` (捺). Each gives its position, width, and height in the coordinate system. For more information on the stroke types in GOD, please consult the god.xsd file.
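Python's standard `xml.etree.ElementTree` is enough for a sketch of how a program might read the 「八」 document. GOD elements live in the default namespace http://cjkv.smaji.org/ns/god, so tag names have to be qualified as `{namespace}stroke`. `Stroke` and `read_strokes` are illustrative names. This is not the Smaji God library (which is written in OCaml), and it ignores every element other than `stroke`.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

GOD_NS = "{http://cjkv.smaji.org/ns/god}"

EIGHT = """<?xml version="1.0"?>
<god version="1.0"
     xmlns="http://cjkv.smaji.org/ns/god"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://cjkv.smaji.org/ns/god http://cjkv.smaji.org/xml/1.0/xsd/god.xsd">
  <glyph unicode="516b,0">
    <stroke type="t" x="0" y="0" width="56" height="112"/>
    <stroke type="p" x="76" y="0" width="56" height="112"/>
  </glyph>
</god>"""


class Stroke(NamedTuple):
    type: str
    x: int
    y: int
    width: int
    height: int


def read_strokes(document: str) -> List[Stroke]:
    """Return the stroke elements of every glyph, in document order."""
    root = ET.fromstring(document)
    if root.tag != GOD_NS + "god":
        raise ValueError(f"not a GOD document: root is {root.tag}")
    strokes = []
    for glyph in root.iter(GOD_NS + "glyph"):
        for el in glyph.findall(GOD_NS + "stroke"):
            numbers = (int(el.get(k)) for k in ("x", "y", "width", "height"))
            strokes.append(Stroke(el.get("type"), *numbers))
    return strokes


for stroke in read_strokes(EIGHT):
    print(stroke)
```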
The following is an excerpt from god.xsd for reference, listing each stroke type code and its full name:
h (Horizontal), sh (Slanted Horizontal), u (Upward horizontal), du (Dot – Upward horizontal), v (Vertical), sv (Slanted Vertical), rsv (Right Slanted Vertical), t (Throw), ft (Flat Throw), wt (Wilted Throw), d (Dot), ed (Extended Dot), ld (Left Dot), wd (Wilted Dot), p (Press), up (Upward horizontal – Press), hp (Horizontal – Press), fp (Flat Press), ufp (Upward horizontal – Flat Press), c (Clockwise curve), a (Anticlockwise curve), o (Oval), hj (Horizontal – J hook), uj (Upward horizontal – J hook), ht (Horizontal – Throw), hsv (Horizontal – Slanted Vertical), hv (Horizontal – Vertical), hvj (Horizontal – Vertical – J hook), htj (Horizontal – Throw – J hook), utj (Upward horizontal – Throw – J hook), hvh (Horizontal – Vertical – Horizontal), hvu (Horizontal – Vertical – Upward horizontal), ha (Horizontal – Anticlockwise curve), haj (Horizontal – Anticlockwise curve – J hook), hpj (Horizontal – Press – J hook), htaj (Horizontal – Throw – Anticlockwise curve – J hook), htc (Horizontal – Throw – Clockwise curve), htht (Horizontal – Throw – Horizontal – Throw), htcj (Horizontal – Throw – Clockwise curve – J hook), hvhv (Horizontal – Vertical – Horizontal – Vertical), hthtj (Horizontal – Throw – Horizontal – Throw – J hook), vu (Vertical – Upward horizontal), vh (Vertical – Horizontal), va (Vertical – Anticlockwise curve), vaj (Vertical – Anticlockwise curve – J hook), vhv (Vertical – Horizontal – Vertical), vht (Vertical – Horizontal – Throw), vhtj (Vertical – Horizontal – Throw – J hook), vj (Vertical – J hook), vc (Vertical – Clockwise curve), vcj (Vertical – Clockwise curve – J hook), tu (Throw – Upward horizontal), th (Throw – Horizontal), td (Throw – Dot), wtd (Wilted Throw – Dot), tht (Throw – Horizontal – Throw), thtj (Throw – Horizontal – Throw – J hook), tj (Throw – J hook), cj (Clockwise curve – J hook), fpj (Flat Press – J hook), pj (Press – J hook), thtaj (Throw – Horizontal – Throw – Anticlockwise curve – J hook), tod (Throw – Oval – Dot)
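The full names in this excerpt follow a visible pattern. A compound stroke's code is the concatenation of simpler codes, and its full name joins their names with " – ". The Python sketch below checks that reading by expanding a code through greedy longest match. The primitive vocabulary is inferred from the single-stroke entries. "j" for J hook never appears on its own in the excerpt, so treating it as a primitive is an assumption.

```python
# Primitive stroke codes, read off the single-stroke entries of the excerpt.
# "j" (J hook) never occurs on its own there; it is an assumption.
PRIMITIVES = {
    "h": "Horizontal", "sh": "Slanted Horizontal", "u": "Upward horizontal",
    "v": "Vertical", "sv": "Slanted Vertical", "rsv": "Right Slanted Vertical",
    "t": "Throw", "ft": "Flat Throw", "wt": "Wilted Throw",
    "d": "Dot", "ed": "Extended Dot", "ld": "Left Dot", "wd": "Wilted Dot",
    "p": "Press", "fp": "Flat Press",
    "c": "Clockwise curve", "a": "Anticlockwise curve", "o": "Oval",
    "j": "J hook",
}
LONGEST = max(len(code) for code in PRIMITIVES)


def expand(code: str) -> str:
    """Expand a compound stroke code into its full name by greedy longest match."""
    names, i = [], 0
    while i < len(code):
        for size in range(min(LONGEST, len(code) - i), 0, -1):
            piece = code[i:i + size]
            if piece in PRIMITIVES:
                names.append(PRIMITIVES[piece])
                i += size
                break
        else:  # no primitive matched at position i
            raise ValueError(f"cannot split {code!r} at position {i}")
    return " – ".join(names)


print(expand("htaj"))  # Horizontal – Throw – Anticlockwise curve – J hook
```

Greedy matching is enough here because every multi-letter primitive is tried before its one-letter prefix. For example, "hsv" splits as h + sv, and "wtd" splits as wt + d.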
The strokes, with their Chinese names, abbreviations, full names, corresponding names in Unicode, and example characters, are listed below:
Chinese name | Abbreviation | Full name | Name in Unicode | Examples |
---|---|---|---|---|
橫 | H | Horizontal | H | 三 言 隹 花 |
斜橫 | SH | Slanted Horizontal | (H) | 七 弋 宅 戈 |
挑 | U | Upward horizontal | T | 刁 求 虫 地 |
點挑 | DU | Dot – Upward horizontal | (T) | 冰 冷 汗 汁 |
豎 | V | Vertical | S | 十 圭 川 仆 |
斜豎 | SV | Slanted Vertical | (S) | 丑 五 亙 貫 |
右斜豎 | RSV | Right Slanted Vertical | (S) | 𠙴 |
撇 | T | Throw | P | 竹 大 乂 勿 |
扁撇 | FT | Flat Throw | (P) | 千 乏 禾 斤 |
直撇 | WT | Wilted Throw | SP | 九 厄 月 几 |
點 | D | Dot | D | 主 卜 夕 凡 |
長點 | ED | Extended Dot | (D) | 囪 囟 这 凶 |
左點 | LD | Left Dot | (D) | 心 忙 恭 烹 |
直點 | WD | Wilted Dot | (D) | 六 文 宇 空 |
捺 | P | Press | N | 人 木 尺 冬 |
挑捺 | UP | Upward horizontal – Press | TN | 文 廴 父 爻 |
橫捺 | HP | Horizontal – Press | (TN) | 入 八 內 全 |
扁捺 | FP | Flat Press | (N) | 走 足 廴 麵 |
挑扁捺 | UFP | Upward horizontal – Flat Press | (TN) | 之 乏 巡 迴 |
彎 | C | Clockwise curve | W | |
曲 | A | Anticlockwise curve | X | |
圈 | O | Oval | Q | 〇 㔔 㪳 㫈 |
橫鈎 | HJ | Horizontal – J hook | HG | 冧 欠 冝 蛋 |
挑鈎 | UJ | Upward horizontal – J hook | (HG) | 也 乜 池 馳 |
橫撇 | HT | Horizontal – Throw | HP | 夕 水 登 令 |
橫斜 | HSV | Horizontal – Slanted Vertical | (HP) | 今 彔 互 恆 |
橫豎 | HV | Horizontal – Vertical | HZ | 口 己 臼 典 |
橫豎鈎 | HVJ | Horizontal – Vertical – J hook | HZG | 而 永 印 令 |
橫撇鈎 | HTJ | Horizontal – Throw – J hook | (HZG) | 勺 方 力 母 |
挑撇鈎 | UTJ | Upward horizontal – Throw – J hook | (HZG) | 也 乜 池 馳 |
橫豎橫 | HVH | Horizontal – Vertical – Horizontal | HZZ | 凹 兕 卍 雋 |
橫豎挑 | HVU | Horizontal – Vertical – Upward horizontal | HZT | 殼 鸠 说 计 |
橫曲 | HA | Horizontal – Anticlockwise curve | HZW | 朵 沿 殳 没 |
橫曲鈎 | HAJ | Horizontal – Anticlockwise curve – J hook | HZWG | 九 几 凡 亢 |
橫捺鈎 | HPJ | Horizontal – Press – J hook | (HZWG) | 風 迅 飛 凰 |
橫撇曲鈎 | HTAJ | Horizontal – Throw – Anticlockwise curve – J hook | HXWG | 乙 氹 乞 乭 |
橫撇彎 | HTC | Horizontal – Throw – Clockwise curve | --- | 過 过 這 这 |
橫撇橫撇 | HTHT | Horizontal – Throw – Horizontal – Throw | HZZP | 延 建 巡 及 |
橫撇彎鈎 | HTCJ | Horizontal – Throw – Clockwise curve – J hook | HPWG | 陳 陌 那 耶 |
橫豎橫豎 | HVHV | Horizontal – Vertical – Horizontal – Vertical | HZZZ | 凸 𡸭 𠱂 𢫋 |
橫撇橫撇鈎 | HTHTJ | Horizontal – Throw – Horizontal – Throw – J hook | HZZZG | 乃 孕 仍 盈 |
豎挑 | VU | Vertical – Upward horizontal | ST | 卬 氏 衣 比 |
豎橫 | VH | Vertical – Horizontal | SZ | 山 世 匡 直 |
豎曲 | VA | Vertical – Anticlockwise curve | SW | 區 亡 四 匹 |
豎曲鈎 | VAJ | Vertical – Anticlockwise curve – J hook | SWG | 孔 已 亂 也 |
豎橫豎 | VHV | Vertical – Horizontal – Vertical | SZZ | 鼎 亞 吳 卐 |
豎橫撇 | VHT | Vertical – Horizontal – Throw | (SZZ) | 奊 捑 𠱐 𧦮 |
豎橫撇鈎 | VHTJ | Vertical – Horizontal – Throw – J hook | SZWG | 弓 弟 丐 弱 |
豎鈎 | VJ | Vertical – J hook | SG | 小 水 到 寸 |
豎彎 | VC | Vertical – Clockwise curve | SWZ | 肅 嘯 蕭 瀟 |
豎彎鈎 | VCJ | Vertical – Clockwise curve – J hook | --- | 𨙨 𨛜 𨞠 𨞰 |
撇挑 | TU | Throw – Upward horizontal | PZ | 去 公 玄 鄉 |
撇橫 | TH | Throw – Horizontal | (SZ) | 互 母 牙 车 |
撇點 | TD | Throw – Dot | PD | 巡 兪 巢 粼 |
直撇點 | WTD | Wilted Throw – Dot | (PD) | 女 如 姦 㜢 |
撇橫撇 | THT | Throw – Horizontal – Throw | (SZZ) | 夨 𠨮 专 砖 |
撇橫撇鈎 | THTJ | Throw – Horizontal – Throw – J hook | (SZWG) | 巧 亟 污 號 |
撇鈎 | TJ | Throw – J hook | PG | 乄 |
彎鈎 | CJ | Clockwise curve – J hook | WG | 狗 豸 豕 象 |
扁捺鈎 | FPJ | Flat Press – J hook | BXG | 心 必 沁 厯 |
捺鈎 | PJ | Press – J hook | XG | 弋 戈 我 銭 |
撇橫撇曲鈎 | THTAJ | Throw – Horizontal – Throw – Anticlockwise curve – J hook | --- | 𠃉 𦲳 𦴱 鳦 |
撇圈點 | TOD | Throw – Oval – Dot | --- | 𡧑 𡆢 |
Processing a GOD document such as the 「八」 example above with the glyph outline generation program provided by Smaji CJKV produces an outline file that can be used in a font editor.
In GOD, glyphs can be composed not only from strokes but also from existing characters. For example, the character "丕" can be composed of the character "不" plus "一".
```xml
<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="4e15,0">
    <ref unicode="4e0d" x="0" y="0" width="128" height="120"/>
    <stroke type="h" x="0" y="114" width="128" height="14"/>
  </glyph>
</god>
```
Referencing a character by its Unicode scalar value is precise, but for common, unambiguous characters, typing the character itself is also a good choice. The GOD file above can be rewritten accordingly by changing line 4 to:
<character utf8= "不" x="0" y="0" width="128" height="120"/>
This gives the following GOD file:
```xml
<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="4e15,0">
    <character utf8="不" x="0" y="0" width="128" height="120"/>
    <stroke type="h" x="0" y="114" width="128" height="14"/>
  </glyph>
</god>
```
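`ref` names a character by its scalar value, while `character` names it by its text, so a tool can normalize both to a code point. Below is a minimal sketch using the standard library. It assumes that `utf8` holds exactly one code point and that a `ref` value may carry the same optional ",selector" suffix as `glyph`. `referenced_code_points` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "{http://cjkv.smaji.org/ns/god}"

BY_SCALAR = """<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="4e15,0">
    <ref unicode="4e0d" x="0" y="0" width="128" height="120"/>
    <stroke type="h" x="0" y="114" width="128" height="14"/>
  </glyph>
</god>"""

BY_TEXT = BY_SCALAR.replace('<ref unicode="4e0d"', '<character utf8="不"')


def referenced_code_points(document: str) -> List[int]:
    """Code points of the characters a GOD document builds on, in order."""
    points = []
    for el in ET.fromstring(document).iter():
        if el.tag == NS + "ref":
            points.append(int(el.get("unicode").split(",")[0], 16))
        elif el.tag == NS + "character":
            text = el.get("utf8")
            if len(text) != 1:
                raise ValueError(f"expected one code point, got {text!r}")
            points.append(ord(text))
    return points


print(referenced_code_points(BY_SCALAR) == referenced_code_points(BY_TEXT))  # True
```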
From this file, the glyph outline of "丕" can be generated.
Now consider another glyph, U+2010F. It looks like "了" turned upside down, and that is exactly what it is. Chinese characters include left-right mirrored characters, up-down mirrored characters, and rotated characters, and this one is a rotated character. How is it described in GOD?
```xml
<?xml version="1.0"?>
<god version="1.0" xmlns="http://cjkv.smaji.org/ns/god">
  <glyph unicode="2010f,0" transform="rotate180">
    <character utf8="了" x="0" y="0" width="88" height="128"/>
  </glyph>
</god>
```
One design principle of GOD concerns characters that have gone through Liding (隶定, transcribing ancient forms into clerical or regular script) and Libian (隶变, the change from seal script to clerical script). Such characters are combinations of basic components and strokes, not transformations of them. Mirroring and rotation therefore apply only to a character as a whole.
Accordingly, a `transform` attribute can be added to the `glyph` element, with one of the following values indicating the transformation:
- `mirror_horizontal`
- `mirror_vertical`
- `rotate180`
The glyph of U+2010F is exactly the character "了" rotated by 180 degrees. Line 3 of this GOD file therefore sets the `transform` attribute to `rotate180`, and line 4 brings in the glyph of "了" as the basis. The result is the required glyph.
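The effect of the three `transform` values on geometry can be sketched as coordinate mappings over the glyph's frame. GOD's exact conventions are not spelled out here, so this Python sketch makes four assumptions:

- `mirror_horizontal` is the left-right mirror.
- `mirror_vertical` is the up-down mirror.
- y grows downwards.
- The frame is the 88 by 128 box of the 「了」 element.

`transform_box` is a hypothetical helper.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def transform_box(box: Box, frame_w: float, frame_h: float, transform: str) -> Box:
    """Map a box inside a frame_w x frame_h glyph through a whole-glyph transform."""
    x, y, w, h = box
    if transform == "mirror_horizontal":  # assumed: left-right mirror
        return Box(frame_w - x - w, y, w, h)
    if transform == "mirror_vertical":    # assumed: up-down mirror
        return Box(x, frame_h - y - h, w, h)
    if transform == "rotate180":          # both mirrors at once
        return Box(frame_w - x - w, frame_h - y - h, w, h)
    raise ValueError(f"unknown transform: {transform!r}")


part = Box(10, 20, 30, 40)  # an arbitrary region inside 「了」
print(transform_box(part, 88, 128, "rotate180"))  # Box(x=48, y=68, width=30, height=40)
```

Under these assumptions, rotation by 180 degrees equals applying both mirrors, and each transform is its own inverse. This fits the principle that the operations act on the character as a whole.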
Smaji CJKV support for GOD
Smaji Glyph Outline
An OCaml library for reading, exporting, and converting glyph outline data and files.
Supported glyph outline formats are:
- SVG (Scalable Vector Graphics): extremely widely used, with an unusually rich set of vector graphics features.
- GLIF (Glyph Interchange Format): the glyph file format of the Unified Font Object (UFO).
Smaji God
An OCaml library for reading, processing, and exporting GOD documents.
Smaji DynGlyph
An executable program that generates font outline files from GOD documents; the outline files can then be used to build fonts. Users can also use it to generate stroke animation files for demonstration.
Smaji DynGlyph Collection
A git repository that stores sample basic stroke libraries used by the dyn-glyph program, as well as a collection of GOD documents submitted by users.
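The following sketch shows one way a stroke library and the `x`, `y`, `width`, and `height` attributes might fit together. It is not how DynGlyph is implemented. It assumes, hypothetically, that a library stroke is a polygon normalized to the unit square. It scales that polygon into the stroke's box and writes it as an SVG path, whose y axis also points down.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def place_stroke(outline: List[Point], x: float, y: float,
                 width: float, height: float) -> List[Point]:
    """Scale a unit-square outline into the box given by a <stroke> element."""
    return [(x + px * width, y + py * height) for px, py in outline]


def svg_path(points: List[Point]) -> str:
    """Write a closed polygon as an SVG <path> element."""
    (x0, y0), *rest = points
    commands = [f"M {x0:g} {y0:g}"] + [f"L {px:g} {py:g}" for px, py in rest]
    return '<path d="' + " ".join(commands) + ' Z"/>'


# A crude, made-up "press" shape: a wedge from top-left to bottom-right,
# placed into the box of the second stroke of 「八」.
press = [(0, 0), (0.2, 0), (1, 0.9), (1, 1), (0.6, 1)]
print(svg_path(place_stroke(press, 76, 0, 56, 112)))
```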
Online God Editor
Edit GOD files online, submit them, and generate SVG outline files or animation files.